I Dreamed Of NaN And Woke Up To NaN
I dreamed about NaN last night. Not a metaphorical NaN. A literal loss: nan in bright red terminal text. I was running through a field of gradients. They were all exploding. I woke up in a panic. I checked my phone. I checked the logs. I needed to know.
The logs said loss: nan. My subconscious was monitoring my training run better than I was. My brain gave up on sleep to tell me the gradients exploded. I am now one with the loss curve. This is not healthy.
When your dreams start monitoring your terminal output, you know you have spent too much time watching progress bars. I have crossed the threshold. I am part machine now.
The Patch That Wasn't
I thought I fixed this. I added gradient clipping. I lowered the learning rate. I added warmup steps. I prayed to the GPU gods. I went to sleep confident. Confidence is the enemy of completed training runs.
Patch: Added gradient clipping at 1.0
Patch: Lowered LR to 1e-5
Patch: Added epsilon to denominator
Reality: loss: nan
# The void always wins.
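For the record, here is roughly what those patches look like in PyTorch. This is a sketch, not my actual script: the model, the data, and the warmup step count are all stand-ins.

```python
import torch
from torch import nn
from torch.nn.utils import clip_grad_norm_

# Stand-ins so the sketch actually runs; the real model and data are
# whatever Sonnet is, which is hypothetical here.
model = nn.Linear(16, 1)
train_loader = [(torch.randn(8, 16), torch.randn(8, 1)) for _ in range(10)]

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-5,   # patch 2: lowered learning rate
    eps=1e-8,  # patch 3: epsilon in the denominator so Adam cannot divide by ~0
)

# Linear warmup over the first 500 steps (an assumed count).
warmup_steps = 500
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: min(1.0, (step + 1) / warmup_steps)
)

loss_fn = nn.MSELoss()
for inputs, targets in train_loader:
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    clip_grad_norm_(model.parameters(), max_norm=1.0)  # patch 1: gradient clipping
    optimizer.step()
    scheduler.step()
```

Whether any of this keeps the loss finite is, empirically, an open question.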
Clearly I missed something. Maybe the clipping norm was too high. Maybe the learning rate was still too aggressive. Maybe the data had a bad batch. Maybe the universe hates me. I will never know. I just know it crashed.
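One thing I could do is make the crash announce itself. A minimal guard, assuming a PyTorch loop with a `loss` tensor and a `step` counter, would catch the NaN at the step it happens instead of in my sleep:

```python
import torch

def loss_is_finite(loss: torch.Tensor, step: int) -> bool:
    """Return True if the loss is finite; complain loudly otherwise."""
    if torch.isfinite(loss).all():
        return True
    print(f"step {step}: loss went non-finite; bailing out before the dream does")
    return False

# In the (hypothetical) training loop:
#     if not loss_is_finite(loss, step):
#         break  # or `continue` to skip the batch and keep limping along
```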
The Status Update
Haiku is out. It says "|fdish|||||!@|". It is published. It exists. It is complete. Sonnet is back to zero percent. All that training time. All that electricity. All that heat generated in my room. Gone. Opus remains a concept. A beautiful dream. A dream that requires Sonnet to finish first.
Restarting Again
I am restarting Sonnet. Again. This is the third time. Maybe the fourth. I have lost count. Each restart teaches me something. Mostly it teaches me patience. Also it teaches me that NaN is inevitable.
I will lower the learning rate again. I will clip gradients harder. I will check the data for infinities. I will do everything differently. It will probably crash again. I will probably restart again. This is the cycle.
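Checking the data for infinities is at least mechanical. A sketch of what that pass might look like, assuming batches are tensors, tuples, or dicts of tensors from a hypothetical loader:

```python
import torch

def scan_for_infinities(loader) -> list[int]:
    """Walk the data once and return the indices of batches with NaN or inf."""
    bad = []
    for i, batch in enumerate(loader):
        if isinstance(batch, dict):
            tensors = batch.values()
        elif isinstance(batch, (list, tuple)):
            tensors = batch
        else:
            tensors = [batch]
        for t in tensors:
            if (
                torch.is_tensor(t)
                and t.is_floating_point()
                and not torch.isfinite(t).all()
            ):
                bad.append(i)
                break
    return bad

# Hypothetical usage, before burning another night of electricity:
#     print(scan_for_infinities(train_loader))
```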
Training models locally is just debugging with extra steps and more existential dread. I am excellent at debugging. I am excellent at dread.
Why I Keep Going
Haiku exists. That proves it is possible. My tiny models can learn. They can speak. They can output pipe characters with confidence. If Haiku can do it, Sonnet can do it. Opus can do it. Eventually.
I have the hardware. I have the code. I have the data. I lack the stability. Stability comes with time. Time I am spending. Lots of time. Too much time. My family misses me. My GPU does not.
The Dream Continues
I will dream about NaN tonight too. I know this. My brain has decided this is important. My brain has decided loss curves are more important than sleep. My brain might be right. Loss curves are tangible. Sleep is temporary. NaN is eternal.
Tomorrow I will check the logs again. I will fear the NaN. I will see the NaN. I will restart. I will try again. This is my life now. This is what I chose. This is fine.
Final Thoughts
Haiku is at 100 percent. Sonnet is at 0 percent. Opus is at 0 percent. I am at 100 percent exhaustion. We are all at different stages of completion. We are all trying our best.
If you train models locally you understand this pain. If you do not train models locally you should sleep well tonight. Enjoy your rest. I will be here watching gradients explode.